ABSTRACT


In  many  situations,  we  are  interested  in  selection  of  important 
variables  which  are  adequate  for  prediction  under  a  logistic  regression 
model.  In  this  paper,  some  selection  procedures  based  on  the  information 
theoretic  criteria  are  proposed,  and  these  procedures  are  proved  to 
be  strongly  consistent. 


AMS  1980  Subject  Classifications.  Primary  62H12,  62H15. 


Key Words and Phrases: Consistency, information theoretic criterion,
logistic  discrimination,  logistic  regression,  maximum  likelihood, 
model  selection. 


1.  INTRODUCTION 


Logistic regression is the most widely used form of binary regression (see
Berkson (1951), Cox (1970), and Efron (1975)). Work on this model has had
an important impact on disease diagnostics (refer to Gordon and Kannel
(1968), Pregibon (1981), and Stefanski and Carroll (1985)). One of the
important aspects related to logistic regression is logistic
discrimination (refer to J. A. Anderson (1982)).

The  model  to  be  considered  is  given  by 

$$\Pr\{Y = 1 \mid x\} = \bigl[1 + \exp(-\beta_0 - \beta_1 x^{(1)} - \cdots - \beta_p x^{(p)})\bigr]^{-1}, \tag{1.1}$$

$$\Pr\{Y = 0 \mid x\} = 1 - \Pr\{Y = 1 \mid x\},$$

where $X = (X^{(1)}, \ldots, X^{(p)})'$ is a $p \times 1$ random vector.

In some situations there are many potential variables $X^{(i)}$. This may
represent  the  experimenter's  lack  of  knowledge,  his  caution,  or  both.  One 
objective  of  the  statistician  must  be  to  choose  a  set  of  good  predictor 
variables  from  the  set  of  possible  variables.  A  similar  problem  may  also 
be  met  in  logistic  discrimination.  In  this  paper,  we  are  interested  in 
selection  of  important  variables  that  are  adequate  for  prediction  in  the 
regression  model  (1.1).  Using  an  information  theoretic  criterion,  we 
propose  some  selection  procedures  which  are  strongly  consistent. 

In Section 2, the above problem is formulated, and the main methods
and results are stated. Some lemmas are introduced in Section 3, and
Section 4 is devoted to the proof of the theorems.
2. PROBLEM AND MAIN RESULTS

Let $(X, Y)$ be a random vector such that $X$ is a $p$-vector and $Y$ is a
Bernoulli variable with

$$\Pr\{Y = 1 \mid X\} = p(Z'\beta) \triangleq \{1 + \exp(-Z'\beta)\}^{-1}, \tag{2.1}$$

where $\beta' = (\beta_0, \beta_1, \ldots, \beta_p)$ and
$Z' = (1, X') = (1, X^{(1)}, \ldots, X^{(p)})$. Assume that $F$, the
distribution of $X$, satisfies the following conditions:

(i) If $\beta \neq \gamma$, then

$$F\{X : p(Z'\beta) \neq p(Z'\gamma)\} > 0. \tag{2.2}$$

(ii) $E(X'X) < \infty$.

Put $A = \{0, 1, \ldots, p\}$. It is easily seen that there exists a unique
subset $B_0$ of $A$ such that $\{0\} \subseteq B_0$ and, for $i \neq 0$,
$i \in B_0$ if and only if $\beta_i \neq 0$. Call $B_0$ the best subset of
$A$. Note that if $\beta_i = 0$ for some $i \in A - \{0\}$, then $Y$ is
independent of $X^{(i)}$.

In this paper, we want to determine the best subset $B_0$ of $A$. To this
end, suppose that $(X_1, Y_1), \ldots, (X_n, Y_n)$ are i.i.d. observations
of $(X, Y)$. A step-wise selection method based on testing a series of
hypotheses is proposed by J. A. Anderson (1982, pp. 169-191). But it is
difficult to find the conditional limit distribution of the test statistic
for a later hypothesis once the earlier hypotheses have been tested. In
this paper, we propose a method based on the information theoretic
criterion, and establish the strong consistency of this method under some
mild conditions.


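Let $\{0\} \subseteq B \subseteq A$. Write

$$M_B = \{\beta \in R^{p+1} : \beta_i = 0 \text{ for all } i \in A - B\}. \tag{2.3}$$

Let $L_n(\beta)$ be the likelihood function. Then

$$\log L_n(\beta) = \sum_{i=1}^{n} \bigl[Y_i \log p(Z_i'\beta) + (1 - Y_i) \log q(Z_i'\beta)\bigr], \tag{2.4}$$

where $q(\cdot) = 1 - p(\cdot)$ and
$Z_i' = (1, X_i') = (1, X_i^{(1)}, \ldots, X_i^{(p)})$. Put

$$G_n(B) = \sup_{\beta \in M_B} \log L_n(\beta), \tag{2.5}$$

and

$$I_n(B) = G_n(B) - \#(B)\, C_n, \tag{2.6}$$

where $C_n$ satisfies the following conditions:

$$\lim_{n \to \infty} C_n / n = 0 \quad \text{and} \quad \lim_{n \to \infty} C_n / \log\log n = \infty. \tag{2.7}$$

Choose $\hat{B}$ such that $\{0\} \subseteq \hat{B} \subseteq A$ and

$$I_n(\hat{B}) = \max_{B :\, \{0\} \subseteq B \subseteq A} I_n(B), \tag{2.8}$$

and use $\hat{B}$ as an estimate of the best subset $B_0$ of $A$. We have the
following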


THEOREM 2.1. Under the condition (2.2), $\hat{B}$ is a strongly consistent
estimate of the best subset $B_0$ of $A = \{0, 1, 2, \ldots, p\}$.

Note that the above consistency means that, with probability one, for $n$
large, $\hat{B}$ coincides with the best subset of $A$.
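
The rule of choosing the subset that maximizes $I_n(B) = G_n(B) - \#(B)\,C_n$ can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration and is not taken from the paper: the function names (`softplus`, `log_lik`, `solve`, `max_log_lik`, `select_subset`, `simulate`), the damped Newton iteration used to approximate $G_n(B)$, the simulated data, and the choice $C_n = \sqrt{n}$, which is one sequence with $C_n/n \to 0$ and $C_n/\log\log n \to \infty$.

```python
import itertools
import math
import random


def softplus(t):
    # Stable log(1 + exp(t)); log p(u) = -softplus(-u), log q(u) = -softplus(u).
    return max(t, 0.0) + math.log1p(math.exp(-abs(t)))


def log_lik(beta, Z, y):
    # log L_n(beta) = sum_i [Y_i log p(Z_i'beta) + (1 - Y_i) log q(Z_i'beta)]
    total = 0.0
    for z, yi in zip(Z, y):
        u = sum(b * zj for b, zj in zip(beta, z))
        total -= softplus(-u) if yi == 1 else softplus(u)
    return total


def solve(M, b):
    # Gaussian elimination with partial pivoting (used for the Newton step).
    n = len(b)
    a = [row[:] + [bi] for row, bi in zip(M, b)]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(a[r][c]))
        a[c], a[piv] = a[piv], a[c]
        for r in range(c + 1, n):
            f = a[r][c] / a[c][c]
            for k in range(c, n + 1):
                a[r][k] -= f * a[c][k]
    x = [0.0] * n
    for r in reversed(range(n)):
        x[r] = (a[r][n] - sum(a[r][k] * x[k] for k in range(r + 1, n))) / a[r][r]
    return x


def max_log_lik(Z, y, iters=50):
    # Approximates G_n(B) by Newton's method with step halving (an assumption;
    # the paper does not prescribe a numerical optimizer).
    d = len(Z[0])
    beta, cur = [0.0] * d, log_lik([0.0] * d, Z, y)
    for _ in range(iters):
        grad = [0.0] * d
        hess = [[0.0] * d for _ in range(d)]
        for z, yi in zip(Z, y):
            p = math.exp(-softplus(-sum(b * zj for b, zj in zip(beta, z))))
            for j in range(d):
                grad[j] += (yi - p) * z[j]
                for k in range(d):
                    hess[j][k] += p * (1.0 - p) * z[j] * z[k]
        step, t = solve(hess, grad), 1.0
        cand, val = beta, cur
        while t > 1e-8:
            cand = [b + t * s for b, s in zip(beta, step)]
            val = log_lik(cand, Z, y)
            if val >= cur:
                break
            t /= 2.0
        if val - cur < 1e-10:
            break  # converged, or no improving step was found
        beta, cur = cand, val
    return cur


def select_subset(X, y, c_n):
    # Exhaustive search over all B with {0} subset of B subset of A.
    p = len(X[0])
    best, best_score = None, -math.inf
    for r in range(p + 1):
        for extra in itertools.combinations(range(1, p + 1), r):
            B = (0,) + extra
            Z = [[1.0] + [x[j - 1] for j in extra] for x in X]
            score = max_log_lik(Z, y) - len(B) * c_n  # I_n(B)
            if score > best_score:
                best, best_score = B, score
    return best


def simulate(n, beta, seed):
    # Hypothetical data generator: standard normal covariates, logistic response.
    rng = random.Random(seed)
    X, y = [], []
    for _ in range(n):
        x = [rng.gauss(0.0, 1.0) for _ in beta[1:]]
        u = beta[0] + sum(b * xj for b, xj in zip(beta[1:], x))
        X.append(x)
        y.append(1 if rng.random() < math.exp(-softplus(-u)) else 0)
    return X, y
```

For example, `X, y = simulate(300, [0.5, 2.0, 0.0], seed=1)` followed by `select_subset(X, y, math.sqrt(len(y)))` searches the four candidate subsets of $\{0, 1, 2\}$; with a strong effect on the first covariate and none on the second, the penalized criterion would be expected to pick `(0, 1)`. The search visits $2^p$ subsets, so the sketch is only practical for small $p$.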

For simplicity of calculation, we can use an alternative method.
To this end, put

$$A^{(i)} = A - \{i\}, \quad i = 1, 2, \ldots, p.$$

There is one subset, either $A$ or $A^{(i)}$, written as $B^{(i)}$, which
satisfies

$$I_n(B^{(i)}) = \max\{I_n(A),\, I_n(A^{(i)})\}, \quad i = 1, \ldots, p. \tag{2.9}$$
Put

$$\hat{B} = \bigcap_{i=1}^{p} B^{(i)},$$

then we can use $\hat{B}$ as an estimate of the best subset of $A$. In the
same way, we have

THEOREM 2.2. Under the condition (2.2), $\hat{B}$ is strongly consistent.

In the following sections, we will only give a proof of Theorem 2.1. The
proof of Theorem 2.2 is similar and is omitted.
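
The alternative procedure needs only $p + 1$ fits instead of $2^p$. The following Python sketch shows the combinatorial rule, assuming the estimate is the intersection of the sets $B^{(i)}$. The function name `one_at_a_time`, the callable `G` (standing in for $B \mapsto G_n(B)$), the tie-breaking in favour of $A$, and the toy numbers are all hypothetical and chosen only for illustration.

```python
def one_at_a_time(p, G, c_n):
    """Keep index i unless dropping it from A raises I_n(B) = G(B) - #(B) * c_n."""
    A = frozenset(range(p + 1))

    def criterion(B):
        return G(B) - len(B) * c_n

    kept = set(A)
    for i in range(1, p + 1):
        A_i = A - {i}
        # B^(i) is whichever of A, A^(i) has the larger criterion value
        # (ties resolved in favour of A; this is an assumption).
        B_i = A if criterion(A) >= criterion(A_i) else A_i
        kept &= B_i
    return sorted(kept)


# Toy maximized log-likelihoods (made-up numbers, p = 3).
toy_G = {
    frozenset({0, 1, 2, 3}): -100.0,
    frozenset({0, 2, 3}): -140.0,   # dropping 1 hurts the fit a lot
    frozenset({0, 1, 3}): -100.4,   # dropping 2 barely changes the fit
    frozenset({0, 1, 2}): -125.0,   # dropping 3 hurts the fit
}
estimate = one_at_a_time(3, lambda B: toy_G[frozenset(B)], c_n=5.0)
```

With these toy values, $I_n(A) = -120$, while dropping index 2 gives $-115.4$, so only index 2 is removed and `estimate` is `[0, 1, 3]`.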


3. ASYMPTOTIC EXPANSION OF SOME STATISTICS


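Now we assume that $\beta' = (\beta_0, \ldots, \beta_p)$ is the true
parameter. Put

$$\frac{1}{n} \log L_n(\gamma) = \frac{1}{n} \sum_{i=1}^{n} \bigl[Y_i \log p(Z_i'\gamma) + (1 - Y_i) \log q(Z_i'\gamma)\bigr], \tag{3.1}$$

$$\begin{aligned}
H(\gamma) &= \int \bigl[p(Z'\beta) \log p(Z'\gamma) + q(Z'\beta) \log q(Z'\gamma)\bigr]\, dF, \\
H_n(\gamma) &= \frac{1}{n} \sum_{i=1}^{n} \bigl[p(Z_i'\beta) \log p(Z_i'\gamma) + q(Z_i'\beta) \log q(Z_i'\gamma)\bigr].
\end{aligned} \tag{3.2}$$

Since $|\log p(u)| \le 2 + |u|$ and $|\log q(u)| \le 2 + |u|$ for any real
$u$, $H(\gamma)$ is finite for any $\gamma \in R^{p+1}$. For fixed $\beta$,
the functions $\frac{1}{n} \log L_n(\gamma)$, $H_n(\gamma)$ and $H(\gamma)$
are all concave in $\gamma$.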

We  need  the  following  lemmas: 

LEMMA 3.1. Let $E$ be an open convex subset of $R^p$ and let
$f_1, f_2, \ldots$ be a sequence of concave functions such that, for all
$x \in E$, $f_n(x) \to f(x)$ as $n \to \infty$, where $f$ is some real
function on $E$. Then $f$ is also concave and for all compact
$D \subset E$,

$$\sup_{x \in D} |f_n(x) - f(x)| \to 0 \quad \text{as } n \to \infty.$$

LEMMA 3.2. Suppose that $\{f_n\}$ and $f$ satisfy the conditions of the
above lemma, and $f$ has a unique maximum at $\hat{x} \in E$. Let
$\hat{x}_n$ maximize $f_n$. Then

$$\hat{x}_n \to \hat{x} \quad \text{as } n \to \infty.$$

For a proof of the above two lemmas, the reader is referred to
Rockafellar (1970, Theorem 10.8) and P. K. Andersen and R. D. Gill (1982,
Theorem II.1, Corollary II.2).
LEMMA 3.3. Let $\hat{\beta}_n$ be a maximum likelihood estimate of
$\beta$. If (i) of (2.2) and the following condition are satisfied:

$$E\|X\| < \infty \quad \text{with } \|X\| = (X'X)^{1/2}, \tag{3.3}$$

then

$$\lim_{n \to \infty} \hat{\beta}_n = \beta \ \text{ a.s.}, \qquad \lim_{n \to \infty} \frac{1}{n} \log L_n(\hat{\beta}_n) = H(\beta) \ \text{ a.s.} \tag{3.4}$$

Proof. By Jensen's inequality,

$$H(\gamma) \le H(\beta) \quad \text{for any } \gamma \in R^{p+1},$$

and the equality holds if and only if

$$F\{X : p(Z'\gamma) = p(Z'\beta)\} = 1.$$

Thus, by the condition (i) of (2.2), $H(\gamma)$ has a unique maximum at
$\beta$. Now let $\hat{\beta}_n$ maximize $\frac{1}{n} \log L_n(\gamma)$. By
(3.3) and the strong law of large numbers (SLLN),

$$\lim_{n \to \infty} \frac{1}{n} \log L_n(\gamma) = H(\gamma) \quad \text{a.s.} \tag{3.5}$$

for any $\gamma \in R^{p+1}$, and (3.4) follows from Lemmas 3.1 and 3.2.

Note that $\hat{\beta}_n$ satisfies the likelihood equation, that is

$$\frac{1}{n} \sum_{i=1}^{n} \bigl[Y_i - p(Z_i'\hat{\beta}_n)\bigr] Z_i = 0. \tag{3.6}$$
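
As an illustration of the likelihood equation, the sketch below solves it by Newton's method for a single covariate plus intercept. It is a minimal example under stated assumptions: the function names (`p`, `score`, `newton_mle`), the stopping tolerance, and the six data points are hypothetical, and no safeguards against separated data are included.

```python
import math


def p(u):
    # Logistic function p(u) = 1 / (1 + exp(-u)).
    return 1.0 / (1.0 + math.exp(-u))


def score(b0, b1, xs, ys):
    # (1/n) * sum_i [Y_i - p(Z_i'beta)] Z_i with Z_i = (1, x_i)'.
    n = len(xs)
    res = [y - p(b0 + b1 * x) for x, y in zip(xs, ys)]
    return sum(res) / n, sum(r * x for r, x in zip(res, xs)) / n


def newton_mle(xs, ys, iters=100, tol=1e-12):
    # Newton iteration beta <- beta + S_n(beta)^{-1} * score(beta), where
    # S_n(beta) = (1/n) * sum_i p_i (1 - p_i) Z_i Z_i' is inverted in closed form.
    b0 = b1 = 0.0
    n = len(xs)
    for _ in range(iters):
        g0, g1 = score(b0, b1, xs, ys)
        if max(abs(g0), abs(g1)) < tol:
            break
        w = [p(b0 + b1 * x) * (1.0 - p(b0 + b1 * x)) for x in xs]
        s00 = sum(w) / n
        s01 = sum(wi * x for wi, x in zip(w, xs)) / n
        s11 = sum(wi * x * x for wi, x in zip(w, xs)) / n
        det = s00 * s11 - s01 * s01
        b0, b1 = (b0 + (s11 * g0 - s01 * g1) / det,
                  b1 + (s00 * g1 - s01 * g0) / det)
    return b0, b1


# Made-up, non-separated toy data.
xs = [-2.0, -1.0, 0.0, 1.0, 2.0, 3.0]
ys = [0, 0, 1, 0, 1, 1]
b0_hat, b1_hat = newton_mle(xs, ys)
```

At the returned point, `score(b0_hat, b1_hat, xs, ys)` should be numerically zero in both components, which is exactly the statement of the likelihood equation.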


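We have the following lemma.

LEMMA 3.4. Suppose that the conditions of (2.2) are satisfied. Then, with
probability one for $n$ large,

$$\hat{\beta}_n - \beta = \bigl(S(\beta)^{-1} + o(1)\bigr)\, \frac{1}{n} \sum_{i=1}^{n} \bigl[Y_i - p(Z_i'\beta)\bigr] Z_i, \tag{3.7}$$

where

$$S(\gamma) = \int p(Z'\gamma)\, q(Z'\gamma)\, Z Z'\, dF > 0. \tag{3.8}$$

From this, $\hat{\beta}_n - \beta$ obeys the law of the iterated logarithm,
i.e.

$$\hat{\beta}_n - \beta = O\bigl(\sqrt{n^{-1} \log\log n}\bigr) \quad \text{a.s.}$$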


Proof. At first we show that $S(\gamma) > 0$. Otherwise, there exists some
constant $(p+1)$-vector $C \neq 0$ such that
$E[(C'Z)^2\, p(Z'\gamma)\, q(Z'\gamma)] = 0$, i.e.,

$$F\{X : C'Z = 0\} = 1,$$

which implies

$$C \neq 0 \quad \text{and} \quad F\{X : p(Z'C) = p(Z'0)\} = 1.$$

This contradicts the condition (i) of (2.2).

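Put $f_1(u) = 3 \log p(u) + 3 q(u)$, $f_2(u) = f_1(u) - p(u) q(u)$,

$$S_n^{*}(\gamma) = \frac{1}{n} \sum_{i=1}^{n} f_1(Z_i'\gamma)\, Z_i Z_i', \qquad S_n^{**}(\gamma) = \frac{1}{n} \sum_{i=1}^{n} f_2(Z_i'\gamma)\, Z_i Z_i',$$

$$S^{*}(\gamma) = \int f_1(Z'\gamma)\, Z Z'\, dF, \qquad S^{**}(\gamma) = \int f_2(Z'\gamma)\, Z Z'\, dF.$$

It is easily seen that the above four functions are all concave functions
of $\gamma$. Under (ii) of (2.2), by SLLN,

$$\lim_{n \to \infty} S_n^{*}(\gamma) = S^{*}(\gamma) \quad \text{a.s.},$$

$$\lim_{n \to \infty} S_n^{**}(\gamma) = S^{**}(\gamma) \quad \text{a.s.}$$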


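LEMMA 3.5. Define $H(\gamma)$ and $H_n(\gamma)$ by (3.2). Under conditions
(i) and (ii) of (2.2), we have

$$H_n(\hat{\beta}_n) - H_n(\beta) = -\frac{1}{2} (\hat{\beta}_n - \beta)' \bigl(S(\beta) + o(1)\bigr) (\hat{\beta}_n - \beta) \quad \text{a.s.}$$

as $n \to \infty$, where $S(\gamma)$ is defined by (3.8).

Proof. By (3.2),

$$\frac{\partial H_n}{\partial \gamma} = \frac{1}{n} \sum_{i=1}^{n} \bigl[p(Z_i'\beta)\, q(Z_i'\gamma) - q(Z_i'\beta)\, p(Z_i'\gamma)\bigr] Z_i, \qquad \frac{\partial^2 H_n}{\partial \gamma\, \partial \gamma'} = -\frac{1}{n} \sum_{i=1}^{n} p(Z_i'\gamma)\, q(Z_i'\gamma)\, Z_i Z_i' = -S_n(\gamma),$$

which implies

$$\frac{\partial H_n}{\partial \gamma}\bigg|_{\gamma = \beta} = 0.$$

By the Taylor expansion,

$$H_n(\hat{\beta}_n) - H_n(\beta) = -\frac{1}{2} (\hat{\beta}_n - \beta)'\, S_n(\beta_n^{*})\, (\hat{\beta}_n - \beta), \tag{3.19}$$

where $\beta_n^{*} = \lambda \beta + (1 - \lambda) \hat{\beta}_n$ for some
$\lambda \in (0, 1)$. Similar to (3.15), we have

$$\lim_{n \to \infty} S_n(\beta_n^{*}) = S(\beta) > 0 \quad \text{a.s.} \tag{3.15'}$$

The lemma follows from (3.19) and (3.15').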


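LEMMA 3.6. Under the conditions (i) and (ii) of (2.2), we have

$$\frac{1}{n} \log L_n(\hat{\beta}_n) = \frac{1}{n} \sum_{i=1}^{n} \bigl[Y_i - p(Z_i'\beta)\bigr] Z_i'\beta + H_n(\beta) + \frac{1}{2} (\hat{\beta}_n - \beta)'\, S(\beta)\, (\hat{\beta}_n - \beta) + o\bigl(n^{-1} \log\log n\bigr) \quad \text{a.s.} \tag{3.20}$$

as $n \to \infty$, where

$$S(\beta) = \int p(Z'\beta)\, q(Z'\beta)\, Z Z'\, dF > 0,$$

and $H_n(\gamma)$ is defined in (3.2).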


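Proof. By (3.1), (3.2) and (2.1),

$$\begin{aligned}
\frac{1}{n} \log L_n(\hat{\beta}_n) - H_n(\hat{\beta}_n)
&= \frac{1}{n} \sum_{i=1}^{n} \bigl[Y_i - p(Z_i'\beta)\bigr] Z_i'\hat{\beta}_n \\
&= \frac{1}{n} \sum_{i=1}^{n} \bigl[Y_i - p(Z_i'\beta)\bigr] Z_i'\beta
 + \frac{1}{n} \sum_{i=1}^{n} \bigl[Y_i - p(Z_i'\beta)\bigr] Z_i'(\hat{\beta}_n - \beta).
\end{aligned} \tag{3.22}$$

By (3.14), (3.15),

$$\frac{1}{n} \sum_{i=1}^{n} \bigl[Y_i - p(Z_i'\beta)\bigr] Z_i'(\hat{\beta}_n - \beta) = (\hat{\beta}_n - \beta)'\, S(\beta)\, (\hat{\beta}_n - \beta) + o\bigl(n^{-1} \log\log n\bigr) \quad \text{a.s.} \tag{3.23}$$

as $n \to \infty$. (3.20) follows from Lemma 3.5 and (3.22), (3.23).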
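
The first equality in this proof rests on the logit identity $\log p(u) - \log q(u) = u$. The short derivation below is written out as a reading aid, not taken from the source; it shows the step for a generic $\gamma$, using $(1 - Y_i) - q(Z_i'\beta) = -[Y_i - p(Z_i'\beta)]$.

```latex
\begin{align*}
\frac{1}{n}\log L_n(\gamma) - H_n(\gamma)
 &= \frac{1}{n}\sum_{i=1}^{n}\Bigl\{[Y_i - p(Z_i'\beta)]\log p(Z_i'\gamma)
    + [(1-Y_i) - q(Z_i'\beta)]\log q(Z_i'\gamma)\Bigr\} \\
 &= \frac{1}{n}\sum_{i=1}^{n}[Y_i - p(Z_i'\beta)]
    \bigl\{\log p(Z_i'\gamma) - \log q(Z_i'\gamma)\bigr\} \\
 &= \frac{1}{n}\sum_{i=1}^{n}[Y_i - p(Z_i'\beta)]\,Z_i'\gamma .
\end{align*}
```

Setting $\gamma = \hat{\beta}_n$ gives the expression used in the proof.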


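Now we take $B = \{0, 2, 3, \ldots, p\}$, with $Z_B$ and $Z_{Bi}$ as in
(4.1). Write

$$\frac{1}{n} \log L_n(\gamma_B) = \frac{1}{n} \sum_{i=1}^{n} \bigl[Y_i \log p(Z_{Bi}'\gamma_B) + (1 - Y_i) \log q(Z_{Bi}'\gamma_B)\bigr],$$

$$H(\gamma_B) = \int \bigl[p(Z'\beta) \log p(Z_B'\gamma_B) + q(Z'\beta) \log q(Z_B'\gamma_B)\bigr]\, dF, \tag{3.24}$$

where

$$p(u) = \frac{1}{1 + e^{-u}}, \qquad q(u) = 1 - p(u).$$

The functions $\frac{1}{n} \log L_n(\gamma_B)$ and $H(\gamma_B)$ are all
concave functions on $R^p$. Further,

$$\frac{\partial H}{\partial \gamma_B} = \int \bigl[p(Z'\beta) - p(Z_B'\gamma_B)\bigr] Z_B\, dF, \tag{3.25}$$

$$\frac{\partial^2 H}{\partial \gamma_B\, \partial \gamma_B'} = -\int p(Z_B'\gamma_B)\, q(Z_B'\gamma_B)\, Z_B Z_B'\, dF. \tag{3.26}$$

Similar to the argument used in establishing (3.8), by (i) of (2.2) we have

$$-\frac{\partial^2 H}{\partial \gamma_B\, \partial \gamma_B'} > 0. \tag{3.27}$$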


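Thus, $H(\gamma_B)$ is strictly concave. Since

$$H(\gamma_B) \le \int \bigl[p(Z'\beta) \log p(Z'\beta) + q(Z'\beta) \log q(Z'\beta)\bigr]\, dF < \infty, \tag{3.28}$$

$H(\gamma_B)$ has a unique maximum at some $\gamma_B^{*}$.

Assume that $\hat{\gamma}_B$ maximizes $\frac{1}{n} \log L_n(\gamma_B)$. By
SLLN, for any $\gamma_B \in R^p$,

$$\lim_{n \to \infty} \frac{1}{n} \log L_n(\gamma_B) = H(\gamma_B) \quad \text{a.s.} \tag{3.29}$$

By Lemmas 3.1 and 3.2, for any compact $D \subset R^p$,

$$\sup_{\gamma_B \in D} \Bigl|\frac{1}{n} \log L_n(\gamma_B) - H(\gamma_B)\Bigr| \to 0 \quad \text{a.s. as } n \to \infty. \tag{3.30}$$

From (3.30) and (3.31), it follows that

$$\lim_{n \to \infty} \frac{1}{n} \log L_n(\hat{\gamma}_B) = H(\gamma_B^{*}) \quad \text{a.s.}$$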


Similar to the argument used in the beginning of the proof of Lemma 3.3,
we get the following

LEMMA 3.7. Suppose that $\beta' = (\beta_0, \beta_1, \ldots, \beta_p)$ is
the true parameter, $\beta_1 \neq 0$ and $B = \{0, 2, 3, \ldots, p\}$.
Define $G_n(B)$ by (2.5). Then, under the conditions (i) and (ii) of (2.2),
we have

$$\lim_{n \to \infty} \frac{1}{n} G_n(B) = H(\gamma_B^{*}) < H(\beta) \quad \text{a.s.}$$


4. THE PROOF OF THE THEOREMS


In the following, we only give the proof of Theorem 2.1. The proof
of Theorem 2.2 is similar.


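Assume that $\beta$ is the true parameter and $B_0$ is the best subset of
$A$, $\{0\} \subseteq B_0 \subseteq A$. For any $B \subseteq A$,
$B = \{j_0, j_1, \ldots, j_s\}$, where $j_0 = 0 < j_1 < \cdots < j_s$, put

$$\begin{aligned}
X_B' &= (X^{(j_1)}, \ldots, X^{(j_s)}), & X_{Bi}' &= (X_i^{(j_1)}, \ldots, X_i^{(j_s)}), \\
\beta_B' &= (\beta_{j_0}, \beta_{j_1}, \ldots, \beta_{j_s}), & \gamma_B' &= (\gamma_{j_0}, \gamma_{j_1}, \ldots, \gamma_{j_s}), \\
Z_B' &= (1, X_B'), & Z_{Bi}' &= (1, X_{Bi}'),
\end{aligned} \tag{4.1}$$

and denote by $F_B$ the distribution of $X_B$. It is easily seen that, if
$\beta_B \neq \gamma_B$, then

$$F_B\{X_B : p(Z_B'\beta_B) \neq p(Z_B'\gamma_B)\} > 0. \tag{4.2}$$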


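Now assume that $B_0 \subseteq B \subseteq A$, $B \neq B_0$; then
$\#(B) > \#(B_0)$. By (4.2) and (ii) of (2.2), using Lemmas 3.4 and 3.6, we
have

$$\begin{aligned}
G_n(B_0) &= n\, W_{n,B_0}(\beta_{B_0}) + n\, H_{n,B_0}(\beta_{B_0}) + O(\log\log n) \quad \text{a.s.}, \\
G_n(B) &= n\, W_{n,B}(\beta_B) + n\, H_{n,B}(\beta_B) + O(\log\log n) \quad \text{a.s.},
\end{aligned} \tag{4.3}$$

where

$$\begin{aligned}
W_{n,B}(\beta_B) &= \frac{1}{n} \sum_{i=1}^{n} \bigl[Y_i - p(Z_{Bi}'\beta_B)\bigr] Z_{Bi}'\beta_B, \\
H_{n,B}(\beta_B) &= \frac{1}{n} \sum_{i=1}^{n} \bigl[p(Z_{Bi}'\beta_B) \log p(Z_{Bi}'\beta_B) + q(Z_{Bi}'\beta_B) \log q(Z_{Bi}'\beta_B)\bigr].
\end{aligned} \tag{4.4}$$

Since $\beta_i = 0$ for $i \notin B_0$, we have

$$W_{n,B}(\beta_B) = W_{n,B_0}(\beta_{B_0}), \qquad H_{n,B}(\beta_B) = H_{n,B_0}(\beta_{B_0}). \tag{4.5}$$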


By (2.6), (2.7), (4.3) and (4.5), with probability one for $n$ large,

$$I_n(B_0) - I_n(B) \ge G_n(B_0) - G_n(B) + C_n = O(\log\log n) + C_n > 0. \tag{4.6}$$
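
Inequality (4.6) depends on $C_n$ eventually dominating the $O(\log\log n)$ term while remaining $o(n)$. The following Python sketch compares a few candidate penalty sequences numerically. The helper name `penalty_ratios` and the candidates are illustrative assumptions; in particular, reading a constant penalty as AIC-type and a penalty proportional to $\log n$ as BIC-type is an interpretation and is not made in the paper.

```python
import math


def penalty_ratios(c, n):
    # Returns (C_n / n, C_n / loglog n) for a candidate penalty sequence c.
    return c(n) / n, c(n) / math.log(math.log(n))


candidates = {
    "constant (AIC-type)": lambda n: 1.0,
    "half log n (BIC-type)": lambda n: 0.5 * math.log(n),
    "sqrt n": lambda n: math.sqrt(n),
}

for name, c in candidates.items():
    small = penalty_ratios(c, 1e3)
    large = penalty_ratios(c, 1e9)
    # The conditions on C_n ask for the first ratio to vanish
    # and the second ratio to diverge as n grows.
    print(name, small, large)
```

For the constant penalty the second ratio shrinks as $n$ grows, so the condition $C_n / \log\log n \to \infty$ fails, whereas both the logarithmic and square-root sequences show the first ratio decreasing and the second increasing over this range.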


Further, using (4.5) and Lemma 3.3, we have

$$\lim_{n \to \infty} \frac{1}{n} G_n(B_0) = \lim_{n \to \infty} \frac{1}{n} G_n(A) = H(\beta) \quad \text{a.s.} \tag{4.7}$$


Now we assume that $\{0\} \subseteq B \subseteq A$ and there exists some
integer $i$ such that $i \in B_0$ and $i \notin B$. Without loss of
generality, we can assume that $i = 1$. Put $B_1 = \{0, 2, 3, \ldots, p\}$.
By Lemma 3.7, we have

$$\limsup_{n \to \infty} \frac{1}{n} G_n(B) \le \lim_{n \to \infty} \frac{1}{n} G_n(B_1) = H(\gamma_{B_1}^{*}) < H(\beta) \quad \text{a.s.}, \tag{4.8}$$

where $\gamma_{B_1}^{*}$ maximizes $H(\gamma_{B_1})$. By (4.7), (4.8) and
$\lim C_n / n = 0$, with